Sheffield University and the TREC 2004 Genomics Track: Query Expansion Using Synonymous Terms
Abstract
In this paper we describe our approach to the Ad Hoc Retrieval task of the TREC 2004 Genomics Track. This is a conventional search task based on a 10-year subset of MEDLINE (about 4.5 million documents, 9 gigabytes in size) and 50 topics derived from information needs obtained by interviewing real biomedical researchers. We also discuss the results of our submitted runs. The hypothesis we test is whether performance on this retrieval task can be improved by expanding queries with synonyms of the original query terms. We use the UMLS Metathesaurus, a comprehensive collection of controlled vocabularies in the biomedical domain, to identify query terms in topics and to determine their synonyms. Our approach is simple in that we consider only synonyms of query terms and do not exploit hierarchical relations between terms such as hyponymy and hypernymy. Synonymy-based query expansion generally increases recall but decreases precision because of ambiguous terms: senses of an ambiguous term that are inappropriate for the topic under consideration give rise to “polluting” synonyms. We hope that using a resource built specifically for biomedical terminology, such as UMLS, will limit the negative effects that synonymy-based query expansion may have on precision.
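The expansion step described above (find query terms in a topic, then add the synonyms of the concepts they name, ignoring hierarchical relations) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' system: a tiny in-memory lexicon stands in for the UMLS Metathesaurus, the concept IDs and entries are made up, and the helper names `find_terms` and `expand_query` are hypothetical. It does not use any real UMLS API.

```python
import re

# HYPOTHETICAL toy lexicon standing in for the UMLS Metathesaurus:
# each surface string maps to one or more concept IDs, and each concept
# lists its synonymous strings. IDs and entries are illustrative only.
STRING_TO_CONCEPTS: dict[str, list[str]] = {
    "heart attack": ["CX1"],
    "myocardial infarction": ["CX1"],
    "cold": ["CX2", "CX3"],  # ambiguous: illness vs. low temperature
}
CONCEPT_SYNONYMS: dict[str, list[str]] = {
    "CX1": ["heart attack", "myocardial infarction", "MI"],
    "CX2": ["cold", "common cold", "coryza"],
    "CX3": ["cold", "low temperature"],
}

MAX_TERM_WORDS = 4  # longest multiword term the matcher will try


def find_terms(text: str) -> list[str]:
    """Greedy longest-match lookup of lexicon strings in the topic text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    found, i = [], 0
    while i < len(words):
        for n in range(min(MAX_TERM_WORDS, len(words) - i), 0, -1):
            candidate = " ".join(words[i:i + n])
            if candidate in STRING_TO_CONCEPTS:
                found.append(candidate)
                i += n
                break
        else:
            i += 1  # no lexicon term starts at this word; move on
    return found


def expand_query(text: str) -> list[list[str]]:
    """Return one OR-group per matched term: the term plus the synonyms
    of every concept it maps to (synonyms only, no hyponyms/hypernyms)."""
    groups = []
    for term in find_terms(text):
        group = [term]
        for concept in STRING_TO_CONCEPTS[term]:
            for syn in CONCEPT_SYNONYMS[concept]:
                if syn not in group:
                    group.append(syn)
        groups.append(group)
    return groups


if __name__ == "__main__":
    for group in expand_query("Genes linked to heart attack after a cold"):
        print(" OR ".join(f'"{s}"' for s in group))
```

Because "cold" maps to two concepts here, its group picks up "low temperature", a sense that is inappropriate for a biomedical topic. This is the kind of polluting synonym the abstract identifies as the main risk to precision.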
Related papers
York University at TREC 2004: HARD and Genomics Tracks
York University participated in the HARD and Genomics tracks this year. For both tracks, we used Okapi BSS (Basic Search System) as the basis. Our experiments mainly focused on exploring various methods for combining document and passage scores, new term-weighting formulae, and feedback methods for query expansion. For the HARD track, we built two levels of indexes, and search against both indexes for ...
WIDIT in TREC 2004 Genomics, Hard, Robust and Web Tracks
To facilitate understanding of information as well as its discovery, we need to combine the capabilities of the human and the machine as well as multiple methods and sources of evidence. Web Information Discovery Tool (WIDIT) Laboratory at the Indiana University School of Library and Information Science houses several projects that aim to apply this idea of multi-level fusion in the areas of in...
Expanding Queries Using Multiple Resources The AID Group at TREC 2006: Genomics Track
We describe our participation in the TREC 2006 Genomics track, in which our main focus was on query expansion. We hypothesized that applying query expansion techniques would help us both to identify and retrieve synonymous terms, and to cope with ambiguity. To this end, we developed several collection-specific as well as web-based strategies. We also performed post-submission experiments, in wh...
Using Concept-Based Indexing to Improve Language Modeling Approach to Genomic IR
Genomic IR, characterized by highly specific information needs, severe synonymy and polysemy problems, long term names, and a rapidly growing literature, is a challenge for the IR community. In this paper, we focus on addressing the synonymy and polysemy issues within the language model framework. Unlike the ways translation model and traditional query expansion techniques approach this issue, we i...
TREC 2005 Genomics Track Experiments at UTA
The University of Tampere submitted runs for the Genomics Track ad hoc retrieval task. The first run (uta05a) was automatic and the second (uta05i) interactive. The uta05a queries were constructed using the original topic terms as query keys. The uta05a queries served as a baseline for the uta05i queries, which were constructed by expanding the uta05a queries with synonyms for the topic gen...